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Abstract. We compute all irregular primes less than 163 577356. For all of 
these primes we verify that the Kummer— Vandiver conjecture holds and that 
the A-invariant is equal to the index of irregularity. 



1. Introduction 

Bernoulli numbers are deeply intertwined with the the arithmetic of cyclotomic 
fields. To explain a basic case, consider the cyclotomic field K = Q(Cp) generated 
by a p-th root of unity ( p = e 27T1 /P, p prime. Let G = Gal(K/Q) ~ (Z/pZ) x be its 
Galois group, and let u> : G — > Z* be the unique p-adic character for which 

io(a) = r mod p 

where cr(Cp) = Cp- Then the p-Sylow subgroup A of the class group decomposes 
into components 

A= A k 

0<fc<p-l 

where Ak is the Z p [G]-submodule on which a acts by multiplication by w k (a). (For 
details, see |Was97j .) 

It is conjectured that Ak is trivial if k is even; this is equivalent to the Vandiver 
(or Kummer- Vandiver) conjecture that the class number h£ of the maximal real 
subfield Q(C P + Cp 1 ) is n °t divisible by p. By combining results of Herbrand and 
Ribet we know that A p _k is nontrivial if and only if the numerator of the fc-th 
Bernoulli number is divisible by p, and the Main Conjecture, proved by Mazur and 
Wiles, shows that the order of A p _k is equal to the power of p that divides Bk- 
Thus, the first step in finding the p-part of the class group of K is to study the 
numerators of Bernoulli numbers. 

If k is an even integer with < k < p — 1, then fc is said to be an irregular 
index for p if p divides the numerator of the fc-th Bernoulli number Bk- The index 
of irregularity of p, denoted i p , is the number of such irregular indices k, and p is 
irregular if i p is positive. If k is irregular for p then (p, k) is said to be an irregular 
pair. Once the irregular pairs for a given p have been found it is possible to make 
further calculations verifying the Kummer- Vandiver conjecture, and to calculate 
the corresponding A-invariants in Iwasawa theory; these invariants A p measure the 
rate of growth of the p-part of the class group of p-power cyclotomic fields. 

This paper describes a calculation of all irregular pairs for p less than 39 • 2 22 = 
163 577356. The computation consumed more than twenty years of CPU time, 
and the quick summary of the results is that nothing surprising happened. The 
Kummer- Vandiver conjecture is true for all of these primes, and the A-invariant 
is always as small as it can be, i.e., X p — i p . This implies that the class group of 
Q(C P ") is, for all of these p, isomorphic to (Z/p™Z)V We did find one new prime 
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with i p = 7, namely p = 32 012 327. At the moment that we write this, the data 
supporting these calculations can be found at |Har09aj . 

Irregular indices have been computed many times over the last 160 years, starting 
with Kummer's hand calculations for p < 163, which he used to verify Fermat's 
Last Theorem and that hp is prime to p. This was extended to p < 619 early in 
the twentieth century by Vandiver and his colleagues, using desk calculators (and 
graduate students). Vandiver and the Lehmers, making one of the first number- 
theoretic calculations on electronic computers, extended this to p < 4001 by 1956. 
For a discussion of these early results, see |Cor08j . A series of further results by the 
Lehmers, Iwasawa, Sims, Kobelev, Johnson, Nicol, Pollack, Selfridge, Tanner, and 
Wagstaff extended these to p < 150 000 by 1987 (see |Joh75j . |Wag78| , or |TW87j for 
a summary and further references). In the early 1990's, Buhler, Crandall, Ernvall, 
Metsankyla, Sompolski, and Shokrollahi used "asymptotically fast" algorithms to 
go much further, and by 1999 the computations were extended to all primes less 
than 12 million |BCS92| . [BCEM93] . jBCE+01] . |EM91j . [EM92j . 

This paper describes our extension of these results. The two major reasons that 
we were able to extend the upper bound on p by a factor of almost 14 were: (a) the 
use of two large clusters at the Texas Advanced Computing Center at the University 
of Texas at Austin, and (b) new algorithms and implementations of them, due to 
the second author, for polynomial arithmetic over finite fields. 

The paper is organized as follows: we first describe the algorithms for calculating 
Bk mod p for < k < p— 1, algorithms for performing the needed polynomial arith- 
metic, and the algorithms for the Vandiver and cyclotomic invariant calculations. 
Then we summarize the results, including a description of the hardware that we 
used and a discussion of the steps we took to ensure correctness. 

2. Bernoulli numbers modulo p 
Bernoulli numbers are defined by a formal power series inversion 

e*-l ~ ^ k kV 

k>0 

We have B\ = 1/2, and all other odd Bernoulli numbers B3, B$, . . . are zero. The 
first, and by far and away the most time-consuming phase of our calculations is the 
computation of Bk mod p, < k < p — 1, k even, for each p in turn. 

We used two different algorithms, the Voronoi identity method and the power 
series method. Both depend crucially on asymptotically fast polynomial arithmetic 
over Z/pZ. The theoretical complexity is 0(p\og 2+£ p) for both methods, but the 
implied constants and memory footprints are different — the algorithm using the 
Voronoi identity seems to be faster but requires more memory. We first describe the 
two methods, and then discuss the algorithms used for the underlying polynomial 
arithmetic. 

The method based on the Voronoi identity is essentially identical to the 'root 
finding method' of [BCE+Ol] : we used it for all p < 21 • 2 22 = 88 8 3 84. Let g be 
a generator of (Z/pZ) x . Then for 2 < k < p — 1, k even, we have 



(1) 



B k 



2k 
1-.9* 



(p-3)/2 



i=0 
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where 



h(x) 



(x mod p) — g{x/ g mod p) g — 1 




This is a version of Voronoi's congruence with the terms arranged in multiplicative 
order; see |Sla87| . [Van 17] (or [DSS91) for a Bernoulli number bibliography, with 
an updated online version at |Dil09j V The sum on the right hand side may be 
regarded as the Fourier transform of i i— » h{g l )/g l with respect to the roots of unity 
{g 2k }^=o^ 2 ■ Using Bluestein's FFT algorithm |Blu70j . we convert the problem of 
simultaneously computing all to the problem of multiplying two polynomials of 
length [p— 1)/2 over Z/pZ, plus 0(p) pre-processing and post-processing operations 
in Z/pZ. 

The power series method is described in }BCS92] . We used the variant based on 
the identity 



where D and the Ai are power series whose coefficients are given by simple recur- 
rence formulae BCS92, p. 719]. This reduces to a power series inversion of length 
p/8 + 0(1), and four series multiplications of length p/8 + 0(1), over Z/pZ. This 
method uses less memory than the Voronoi identity method, and we used it for all 
primes 21 • 2 22 < p < 39 • 2 22 . 



Most of the algorithms for polynomial and power series arithmetic described here 
are implemented in the zn_poly library of the second author [Har08b . 

In what follows, we consider arithmetic in R[x], including arithmetic on truncated 
power series in i?[x], where R is a coefficient ring in which 2 is not a zero-divisor. 
For the algorithms described above, we could take R = Z/pZ, where p is the 
prime of interest. In our implementation, we found it profitable to handle two 
nearby primes simultaneously, taking instead R — Z/pip 2 Z, and using the Chinese 
Remainder Theorem. This was possible since the computations were run on 64-bit 
hardware, and the largest prime considered was less than 32 bits long. 

Our large polynomial multiplication routine involves a combination of algo- 
rithms. The outermost layer reduces the multiplication of polynomials f,g 6 
R[x] of length n to 0(n 1 ^ 2 ) negacyclic convolutions of length (^(n 1 / 2 ), using an 
adaptation to polynomials of the Schonhage-Strassen integer multiplication algo- 
rithm |SS71j suggested by Schonhage |Sch77j . This proceeds by splitting the in- 
puts into segments of length M/2 and embedding the multiplication problem into 
R[y, z\j (y M + 1, z K — 1) via the inverse of the substitution y \— > x, z i— > x M / 2 , where 
M = 2r( 1 + 1 °g 2 «)/2l = o( n i/2) and K = 2 riog 2 (4n/M)l = o( n V2). These conditions 

ensure that MK > An (the product 'fits' into the bivariate quotient ring) and that 
K | 2M. The bivariate multiplication may then be effected by using FFTs over 
the ring R[y]/(y M + 1) with respect to the primitive K-th root of unity y 2M / K . 
These FFTs involve only additions and subtractions in R, plus some book-keeping 
to keep track of powers of y. To improve memory locality and smoothness of the 
running time with respect to n, we use a cache-friendly version Har09b] of van der 
Hoeven's truncated Fourier transform (TFT) method [vdH04, vdH05j . 




3. Polynomial and power series arithmetic 



4 



J. P. BUHLER AND D. HARVEY 



For the negacyclic convolutions (recursive multiplications in R[y]/(y M + 1)) we 
use Nussbaumer's algorithm Nus80 for sufficiently large M; this follows the same 
general plan as Schonhage's algorithm sketched above, but uses a slightly different 
substitution that can reduce the size of the Fourier coefficients by a factor of two 
in some cases. For sufficiently small M, we switch to a variant of the Kronecker 
substitution method Har09c , which packs the input polynomials into integers and 
multiplies the resulting integers. 

Our power series inversion algorithm is based on a Newton iteration. We fol- 
lowed the approach suggested in HQZ04, Algorithm MP-inv, p. 421], which takes 
advantage of the middle product to improve the running time constant. A minor 
improvement in our algorithm is that we reuse the Fourier transform of a in lines 
3 and 4 of |HQZ04| , reducing the theoretical cost by a further 1/6. To transport 
the smoothness benefits of the truncated FFT from the ordinary product to the 
middle product, it was necessary to implement a transposed TFT and inverse TFT 
over 

R[y]/{y M + 1), making use of the "transposition principle" [BLS03J . Unlike 
the usual (non-truncated) FFT, the matrices for the TFT and inverse TFT are not 
symmetric, so genuinely new code was required. 

It is worth pointing out that our polynomial arithmetic routines are based en- 
tirely on integer arithmetic; we did not use floating point arithmetic at all. 

Memory constraints became formidable for those p near the top of the range 
that we handled using the Voronoi identity method. Just to store two polynomials 
of length p/2 and their product, where p » 21 ■ 2 22 , requires 1.4 GB of memory, 
assuming a 64-bit word for each coefficient. Each thread had only 2 GB RAM 
available (see description of the hardware below), leaving very little auxiliary work- 
ing storage, certainly much less than is required to store the FFT coefficients. To 
overcome this, we implemented a discrete Fourier transform variant that uses a 
naive quadratic-time DFT algorithm to handle the first few layers of the transform, 
and then the standard O(nlogn) algorithm for the remaining layers. The resulting 
time-space tradeoff was acceptable up to some limit, but became infeasible as we 
approached the 2 GB barrier, prompting us to switch to the less memory-intensive 
power series method. 



4. Kummer-Vandiver and cyclotomic invariants 

We quickly summarize some well-known facts about cyclotomic class groups, 
using notation established above; for details see |Was97] . If p is prime and k is even, 
< k < p — 1, then Ap^k is nontrivial if and only if k is irregular for p. Moreover, 
a "reflection theorem" implies that if A^ is nontrivial then A p -k is nontrivial, i.e., 
k is irregular for p. The component A^ has the same order as the w fe -component of 
the p-Sylow subgroup of Uk/ Cyc K , where Uk — O k is the group of units inside 
the ring of integers of K, and Cyc^ denotes the group of cyclotomic units. The 
latter can be described explicitly, and Vandiver's conjecture for p, which says that 
Ak is trivial for all even k, can be proved by showing that the cyclotomic unit u/~ 
that generates this component does not have a p-th root lying in K. 

This can be established by finding a prime ideal Q of K that has residue class 
degree 1, for which Uk is not a p-th. power modulo Q. This reduces (again, for 
details see W as 97 ) to showing that it is possible to find a rational prime q with 
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q = l mod p such that V pk ^ 1 mod q, where 

(P-1V2 

V P ,k= II (^-^ C f^ k (modq). 

c=l 

Here z is a p-th root of unity modulo q, and Q is one of the primes lying over q. 
Thus, Vandiver's conjecture for p is proved if the above condition is checked for 
each irregular index k for p. We performed this computation for every irregular 
pair in our table, and in each case the above condition held, and in all cases the 
smallest prime q = 1 mod p worked. 

To verify that the A-invariant A p is equal to i p it suffices to check that certain 
congruences between Bernoulli numbers do not hold. These reduce to elementary 
congruences EM91J. To describe these, let 

(p-l)/2 

S(e)= ]T a e . 

a=l 

For each irregular index k for a given prime p we check that 

2 fe ^ 1 mod p, 

S{k -l)^S(p + k-2) mod p 2 , 

kS(k - 1) ^ (k - l)S(p + k-2) mod p 2 . 

These congruences imply that Bk ^ mod p 2 , and if they hold for all irregular 
indices k for p then A p = i p . In the case that 2 k = 1 mod p, this test can be 
replaced by another one; for details on this see |EM9 1 (though note that our 
notation is different). 

5. Results 

There are N = 9163 831 primes up to 39 • 2 22 = 163 5 7 7 3 56. The indices of 
irregularity i p ranged up to 7, and the number of primes Ni of index i are tabulated 
below. 



i 




Ni/N 


Pi 


N-pt 





5 559 267 


0.6067 


0.6065 


5 558144 


1 


2 779 293 


0.3033 


0.3032 


2 779072 


2 


694218 


0.0758 


0.0758 


694 768 


3 


115 060 


0.01256 


0.01263 


115 794 


4 


14 425 


0.00157 


0.00158 


14474 


5 


1451 


0.000158 


0.000158 


1447 


6 


112 


0.000012 


0.000013 


120 


7 


5 


0.00000055 


0.00000094 


8 



Table 1 . Irregular index statistics for p < 39 ■ 2 



Here pi = e _1 / 2 /(2 l i!) is the probability that a Poisson process with mean 1/2 
has i successes. This is motivated by the heuristic (Lehmer, Siegel) that if the 
Bernoulli numerators are random modulo p, then i p is the number of successes in 
p/2 trials each having probability 1/p of success. 
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The five primes of index 7 are 3 238 481, 5 216111, 5 620 861, 9 208 289 and 
32 012 327. 

The Vandiver conjecture was found to be true for all primes tested. The A- 
invariant was equal to i p for all primes; the alternate test mentioned above was 
required for 3 primes. 

For a discussion of heuristics that might apply to Vandiver's conjecture and to 
the value of X p see |Was 97, p. 159] and |Lan90| p. 261]. If (p, k) is an irregular 
pair, Washington examines the consequences of assuming that the probability that 
the cyclotomic unit Uk is a p-th power is 1/p, all independently of each other. 
Using this and heuristics on the distribution of i p , he guesses that the number of 
counterexamples to Vandiver's conjecture for p < x should grow like h log log a;. As 
an exercise in futility, we may try to refine Washington's heuristics using the known 
values of i p , taking into account for example that the first irregular prime (p = 37) 
is unexpectedly large. We find that the expected number of counterexamples up to 
12 million is about 0.674, and that another 0.074 counterexamples were expected 
between 12 million and 163 million (though of course we now know that there 
are none in either case). Many people believe that Vandiver's conjecture is true; 
it also seems reasonable to believe that the conjecture is false but that the first 
counterexample is so astronomically large that it may never be known. Similar 
remarks apply to the A p = i p conjecture. 

As mentioned earlier, the table of irregular pairs, together with check data for 
the Vandiver and cyclotomic invariant verifications, can be found at |Har09aj . 

6. Correctness 

Any computation on this scale is virtually guaranteed to encounter faults of 
various kinds, and this computation was no exception. On one occasion during 
the main Bernoulli number computation, a hardware or operating system failure 
— we could not determine which — resulted in several kilobytes of output being 
overwritten by random data, and it was necessary to repeat the computation for the 
affected primes. On another occasion, the Vandiver and cyclotomic runs located 
subtle memory errors (subsequently confirmed by other tests) on two workstations. 
In comparing our data for those tests, we found a handful of examples (which were 
recalculated) were some sort of write error invalidating a very small number of 
lines. To combat this sort of occurrence, we took careful precautionary measures 
throughout the project, which we now describe. 

The main irregular index computation was far too expensive to run more than 
once. We checked the correctness of this phase by several methods. During the 
main computation itself, for each p we verified the identity 

p-3 

2 fe (fc + l)B k = -4 (mod p) 

(see [BCS921 p. 720]). Already this is a strong consistency check, but it has the 
drawback that it leaves no certificate that can be checked after the computation is 
finished. To increase our confidence in the output, we made our program record 
somewhat more than just the irregular indices. Let N p — min(21ogp, i(p — 3)). 
After computing all B k mod p for a given p, we extracted and stored the N p smallest 
pairs (k,Bk modp), where the pairs are ordered first by B k mod p and then by 
k. In particular, the irregular pairs are listed first, and for the largest primes 
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considered we store about 37 pairs. The resulting compressed file is 1.5 GB, and is 
available for download )Har09a] . We then used a completely independent program 
and hardware to check these recorded pairs. For each k, one may compute mod p 
in 0(p) operations using an identity such as ([1]). We used the highly optimized 
implementation described in [Har08a]. 

This scheme has an additional property we might expect of a 'certificate'. A 
skeptic may randomly select and verify a pair (fc, mod p) from our output file. 
Since presumably the only way to find a k for which B^ mod p is 'small' is to 
compute all the B^ mod p, the skeptic may be persuaded that we must have actually 
performed the full computation for each p. Note that recording all the B^ mod p 
that we computed is impractical, requiring in the order of 10 6 GB of storage. 

For the Vandiver tests, we computed V p ^ twice for each irregular pair and cross- 
checked the results. This was done using programs written independently by the 
two authors, working separately and sharing no code. The programs were run on 
different hardware, using different libraries for the modular arithmetic (zn_poly and 
NTL [ShoOQj l. 

The first run used a new implementation of the method described in [BCE+Ol] , 
updated for 64-bit hardware. For the second run, we used a slightly different 
algorithm. Instead of iterating over c from 1 up to (p — l)/2, we write c = 2 1 g 3 , 
where g is a primitive root modulo p, and iterate over < i < t in an inner loop 
and < j < (p — l)/t in an outer loop, where t is the order of 2 modulo p. For 
each pair (i,j) we only include the multiplicand in the product if the proposed c 
satisfies 1 < c < (p — l)/2. The advantage of this approach is that only a single 
multiplication in Z/pZ is required to update c p ~ 1 ~ k on each iteration, rather than 
the O(logp) operations needed to compute each c p ~ 1_fe had the c been processed 
sequentially. Updating z c under this scheme requires only a single operation in the 
inner loop (since z 1 gl = (z 2 gJ ) 2 ), and slightly more in the outer loop. The latter 
executes infrequently for most p, since t is usually large. 

For the cyclotomic invariants, we again ran everything twice, using indepen- 
dent programs on different hardware. The first run used the implementation from 
BCE + 0l] . and the second was written from scratch using NTL's modular arith- 
metic. 

Earlier computations of irregular pairs and associated data uncovered minor 
problems in their predecessors' data, so it was natural for us to compare our data 
with the results up to 12 million from ten years ago. We found four errors in the 
online tables, and are confident that three of the cases were introduced by an ill- 
advised editing, by hand, of the table to correct cases in which a faulty computer 
had been used, and one of the cases was output of the faulty computer itself. 

7. Hardware 

The aforementioned computations were performed on a number of different ma- 
chines: 

'Lonestar' is a 1300-node cluster at the Texas Advanced Computing Center 
(TACC), running a custom Linux operating system. Each node contains two dual- 
core 2.66 GHz Intel Xeon (Woodcrest) processors and has 8 GB RAM. This ma- 
chine was used for computing irregular indices (about 119 000 core-hours). We used 
various versions of GMP (4.2.1-4.2.4) |Gra08j for the large integer multiplication, 
patched with assembly code written by Jason Martin for the Core 2 architecture. 
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'Ranger' is a 3936-node cluster at TACC, running a custom Linux operating 
system. Each node contains four quad-core 2.3 GHz AMD Opteron (Barcelona) 
processors and has 32 GB RAM. At the time the computations were run, this 
machine was ranked by top500.org as the fourth most powerful supercomputer in 
the world. This machine was used for computing irregular indices (about 68 000 
core-hours). We used the same versions of GMP mentioned above, together with 
assembly code written by the second author and Torbjorn Granlund. This code has 
subsequently been incorporated into public releases of GMP; indeed, the needs of 
the present computation were a large part of the motivation for this work on GMP. 

The first run of the Vandiver and cyclotomic invariant checks was performed on 
a network of 24 desktop machines, each with a 3.4 GHz Pentium 4 CPU (about 
9 000 core-hours). The second run was performed on a 16-core 2.6 GHz Opteron 
server (about 10 000 core-hours). This machine was also used for the verification of 
the Bk modp (about 500 core-hours). 
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